Estudio de un Conjunto de Datos que Representa Eventos Infrecuentes y la Viabilidad de la Distribución Poisson Para su Modelado

El presente trabajo se propone estudiar un conjunto de datos con los incidentes atendidos por el departamento de bomberos de una gran urbe contemporánea para ver hasta qué punto estos se adecuan a la idea inicial que podríamos de tener. A saber, que en su mayoría son eventos excepcionales y por tanto es tentador pensar que son bien modelados por la distribución del títutlo.

Siguiente el consejo de la docente, añadimos también el estudio a través de la distribución binomial negativa, que en su concepción heurística también se plantea como ideal para modelar esta clase de sucesos.

Preparamos los Datos para su Estudio

Cargamos algunas librerías útiles:

library(MASS, pos = "package:base") # WARNING: Import before tidyverse to keep dplyr's `select`
library(tidyverse)
library(lubridate)
library(glue)
library(here)
library(knitr)

Leemos el dataset, bajado de acá:

read_dataset <- function(fn) {
  read_csv(fn) %>%
  rename(
    incident_type = CFD_INCIDENT_TYPE_GROUP,
    incident_time = CREATE_TIME_INCIDENT) %>%
    # Reemplazamos NA con "NULL" para poder seleccionar los incident_type con
    # valor faltante más tarde:
    mutate(incident_type = map_chr(incident_type,
                                   ~ifelse(is.na(.x), "NULL", .x)))  
}

count_incidents_per_type <- function(df) {
  df %>% count(incident_type) %>% arrange(-n)
}

fn <- here("data/Cincinnati_Fire_Incidents__CAD___including_EMS__ALS_BLS_.csv")
full_incidents <- read_dataset(fn)
## Parsed with column specification:
## cols(
##   ADDRESS_X = col_character(),
##   LATITUDE_X = col_double(),
##   LONGITUDE_X = col_double(),
##   AGENCY = col_character(),
##   CREATE_TIME_INCIDENT = col_character(),
##   DISPOSITION_TEXT = col_character(),
##   EVENT_NUMBER = col_character(),
##   INCIDENT_TYPE_ID = col_character(),
##   INCIDENT_TYPE_DESC = col_character(),
##   NEIGHBORHOOD = col_character(),
##   ARRIVAL_TIME_PRIMARY_UNIT = col_character(),
##   BEAT = col_character(),
##   CLOSED_TIME_INCIDENT = col_character(),
##   DISPATCH_TIME_PRIMARY_UNIT = col_character(),
##   CFD_INCIDENT_TYPE = col_character(),
##   CFD_INCIDENT_TYPE_GROUP = col_character(),
##   COMMUNITY_COUNCIL_NEIGHBORHOOD = col_character()
## )
n_per_incident_type <- count_incidents_per_type(full_incidents)

# Uncomment to print the list of incident types and their frequency:
# kable(n_per_incident_type) 

Lo que nos interesa es separar las llamadas según su tipo y ver la frecuencia de cada tipo, para eso por lo pronto las contamos, considerando que el período de los registros abarca unos tres años.

incident_type n
SICK PERSON 30798
BREATHING PROBLEMS 24009
FALLS 22481
PERSON DOWN 19529
AUTOMATIC FIRE ALARM 19507
MEDICAL EMERGENCY 14627
CHEST PAIN / CHEST DISCOMFORT (NON-TRAUMATIC) 14327
INFORMATION TELETYPE-NO DISPATCH 12682
UNCONSCIOUS / FAINTING (NEAR) 10468
CONVULSIONS / SEIZURES 9019
UNKNOWN PROBLEM (PERSON DOWN) 8358
ABDOMINAL PAIN / PROBLEMS 7462
ACCIDENT WITH INJURY - FIRE ONLY 7324
HEROIN OVERDOSE 7290
HEMORRHAGE / LACERATIONS 7150
ASSAULT WITH INJURY 6224
DIABETIC PROBLEMS 5205
ACCIDENT WITH INJURY 4329
STRUCTURE FIRE 4317
NULL 4054
OUTDOOR FIRE 3792
DETAIL/SPECIAL ASSIGNMENT 3528
STROKE (CVA) / TRANSIENT ISCHEMIC ATTACK (TIA) 3313
PREGNANCY / CHILDBIRTH / MISCARRIAGE 3200
TRAUMATIC INJURIES (SPECIFIC) 3095
HEART PROBLEM / A.I.C.D 2709
OVERDOSE / POSIONING (INGESTION) 2620
SUICIDE ATTEMPT 2611
FUMES 2526
SALVAGE 2514

Preparamos una Función que Analiza la Muestra para el Tipo de Evento Seleccionado Según Ambas Distribuciones

keep_incidents_of_type <- function(chosen_type) {
  full_incidents %>%
    select(incident_type, incident_time) %>%
    filter(incident_type == chosen_type) %>%
    select(incident_time) %>%
    mutate(incident_time = as.POSIXct(incident_time,
                                      format = "%m/%d/%Y %I:%M:%S %p",
                                      tz = "UTC")) %>%
    mutate(incident_day = floor_date(incident_time, "day"))
}

compare_poisson_and_bn_for_incident_type <- function(chosen_incident_type) {
  chosen_incidents <- keep_incidents_of_type(chosen_incident_type)

  daily_incidents_count <-
    chosen_incidents %>%
    count(incident_day) %>%
    select(n)

  # Agregamos los días sin llamadas (días que no aparecen en el dataset)
  # como filas de ceros:
  max_day <- max(chosen_incidents$incident_day)
  min_day <- min(chosen_incidents$incident_day)
  total_days <- max(as.integer(max_day - min_day), nrow(daily_incidents_count))
  missing_days <- total_days - nrow(daily_incidents_count)
  
  daily_incidents_count <-
    rbind(tibble(n = rep(0, missing_days)), daily_incidents_count) %>%
    mutate(n = as.integer(n))

  # Frecuencia de frecuencias: contamos cuántos días tienen 0 llamadas,
  # cuántos días tienen 1 llamada, ...
  daily_incidents_freq <-
    daily_incidents_count %>%
      count(n) %>%
      rename(count = nn) %>%
      mutate(
        density = count / nrow(daily_incidents_count),
        density_type = "empirical") %>%
      select(n, density_type, density)

  # Safety check:
  assertthat::assert_that((near(sum(daily_incidents_freq$density), 1)))
  
  # Asumiendo distribución Poisson, estimamos el parámetro como la media muestral:
  lambda_estimate <- mean(daily_incidents_count$n)

  # Ajustamos el número de incidentes diario con la distribución Binomial Negativa
  # para comparar con la Poisson:
  fit <- fitdistr(daily_incidents_count$n, densfun = "negative binomial")
  bn_size_estimate <- fit$estimate[["size"]]
  # Obs: El parámetro "mu" de la parametrización alternativa de la distribución
  # BN se estima igual que el lambda de Poisson, con la media muestral.
  bn_mu_estimate <- lambda_estimate

  max_n <- max(daily_incidents_count$n)
  min_n <- min(daily_incidents_count$n)

  # Juntamos las densidades esperadas por Poisson y BN para cada n observado
  # para luego comparar con la densidad empírica:
  compute_density <- function(n, density_type, density_function) {
    if (density_type == "poisson") {
      density_function(n, lambda = lambda_estimate)
    } else if (density_type == "negative_binomial") {
      density_function(n, size = bn_size_estimate, mu = bn_mu_estimate)
    } else {
      stop(glue("I don't know how to compute density for type: {density_type}"))
    }
  }
  
  probabilities_per_n <-
    cross_df(list(
    n = seq(min_n, max_n),
    density_type = c("poisson", "negative_binomial"))) %>%
    mutate(
      density_function = map(density_type,
                             ~ifelse(.x == "poisson", dpois, dnbinom)),
      density = pmap_dbl(list(n, density_type, density_function),
                         compute_density)) %>%
    select(n, density_type, density)
  
  probabilities_per_n <- rbind(probabilities_per_n, daily_incidents_freq)
  
  # Sumamos las curvas de densidad Poisson y BN al gráfico de la empírica:
  poisson_color <- "ForestGreen"
  bn_color <- "IndianRed"
  empirical_color <- "#444444"
  
  fig <-
    probabilities_per_n %>%
    ggplot() +
    aes(x = n, y = density, color = density_type) +
    geom_point(alpha = .7) +
    geom_line() +
    scale_color_manual(values = c(empirical_color, bn_color, poisson_color)) +
    labs(
      title = glue("Número de '{chosen_incident_type}' por día"),
      subtitle = "Ajuste de la distribución empírica con Poisson y Binomial Negativa",
      y = "Densidad",
      x = glue("Número de llamadas"))
  
  return (list(
    probabilities_per_n = probabilities_per_n,
    fig = fig
  ))
}

Corremos la comparación para todos los tipos de incidentes del dataset:

for (incident_type in n_per_incident_type$incident_type) {
  print(glue("Working with: {incident_type}"))
  comparison <- compare_poisson_and_bn_for_incident_type(incident_type)
  
  # Sanitize the incident type for the plot filename:
  name <- str_to_lower(incident_type)
  name <- str_replace_all(name, " ", "_")
  name <- str_replace_all(name, "/", "-")
  
  filename <- here(glue("results/{name}.png"))
  ggsave(filename, comparison$fig, width = 10, height = 6)
  
  print(comparison$fig)
}
## Working with: SICK PERSON
## Working with: BREATHING PROBLEMS

## Working with: FALLS

## Working with: PERSON DOWN

## Working with: AUTOMATIC FIRE ALARM

## Working with: MEDICAL EMERGENCY

## Working with: CHEST PAIN / CHEST DISCOMFORT (NON-TRAUMATIC)

## Working with: INFORMATION TELETYPE-NO DISPATCH

## Working with: UNCONSCIOUS / FAINTING (NEAR)

## Working with: CONVULSIONS / SEIZURES

## Working with: UNKNOWN PROBLEM (PERSON DOWN)

## Working with: ABDOMINAL PAIN / PROBLEMS

## Working with: ACCIDENT WITH INJURY - FIRE ONLY

## Working with: HEROIN OVERDOSE

## Working with: HEMORRHAGE / LACERATIONS

## Working with: ASSAULT WITH INJURY

## Working with: DIABETIC PROBLEMS

## Working with: ACCIDENT WITH INJURY
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: STRUCTURE FIRE

## Working with: NULL
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: OUTDOOR FIRE

## Working with: DETAIL/SPECIAL ASSIGNMENT

## Working with: STROKE (CVA) / TRANSIENT ISCHEMIC ATTACK (TIA)

## Working with: PREGNANCY / CHILDBIRTH / MISCARRIAGE

## Working with: TRAUMATIC INJURIES (SPECIFIC)

## Working with: HEART PROBLEM / A.I.C.D

## Working with: OVERDOSE / POSIONING (INGESTION)

## Working with: SUICIDE ATTEMPT

## Working with: FUMES
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: SALVAGE
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: FIRE ADVISED - NO DISPATCH

## Working with: WIRES DOWN/ARCING
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: BACK PAIN (NON-TRAUMATIC OR NON-RECENT TRAUMA)

## Working with: FIRE DRILL - NO RESPONSE

## Working with: SHOOTING HAS OCCURRED

## Working with: VEHICLE FIRE

## Working with: LOCK IN/LOCK OUT

## Working with: CARDIAC OR RESPIRATORY ARREST / DEATH
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: HEADACHE

## Working with: WALKIN AT FIRE STATIOIN

## Working with: ALLERGIES (REASTIONS) / ENVENOMATIONS (STINGS, BITES)

## Working with: CARBON MONOXIDE ALARM
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: ENTRAPMENT - AUTO ACCIDENT
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: STUCK IN ELEVATOR

## Working with: STILL ALARM

## Working with: POSSIBLE DOA

## Working with: CHEST PAIN/CHEST DISCOMFORT (NON-TRAUMATIC)
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: INVESTIGATION
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: DOMESTIC VIOLENCE WITH INJURY

## Working with: FIRE HYDRANT LEAKING OR STRUCK

## Working with: TRAFFIC / TRANSPORTATION INCIDENTS
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: NOT USED

## Working with: AUTO ACCIDENT - CAR HIT BUILDING

## Working with: CUTTING - FIRE ONLY
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: ROBBERY WITH INJURY

## Working with: CHOKING

## Working with: TASING - POLICE REQUEST

## Working with: CUTTING
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: ASSAULT / SEXUAL ASSAULT / STUN GUN
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: GAS SPILL - MINOR < 25 GALLONS

## Working with: ANIMAL BITES / ATTACKS

## Working with: EYE PROBLEMS / INJURIES
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: FIRE REPORTED OUT

## Working with: SWAT OPERATION
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: BURNS (SCALDS) / EXPLOSION (BLAST)

## Working with: OUTLET SPARKING

## Working with: FIRE EQUIPMENT INVOLVED IN ACCIDENT

## Working with: POLICE OFFICER NEEDS ASSISTANCE

## Working with: CARBON MONOXIDE ILL/WITH SYMPTOMS

## Working with: HEAT / COLD EXPOSURE
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: BOMB REMOVAL

## Working with: RIVER EMERGENCY
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: TRUCK FIRE
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: ENTRAPMENT - MECHANICAL/FIRE ONLY
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: MUTUAL AID REQEST
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: PSYCHIATRIC / ABNORMAL BEHAVIOR / SUICIDE ATTEMPT

## Working with: STAB / GUNSHOT / PENETRATING TRAUMA
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: TEST

## Working with: INACTIVITY ALARM

## Working with: RAPE WITH INJURY
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: INJURY - POLICE REQUEST
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: ELECTROCUTION / LIGHTNING

## Working with: FIREFIGHTER NEEDS ASSISTANCE
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: RAPE

## Working with: CHECMICAL INVESTIGATION

## Working with: STRUCTURAL OR TRENCH COLLAPSE
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: AIR CRAFT IN TROUBLE OR DOWN
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: GAS SPILL - MAJOR > 25 GALLONS

## Working with: BIOLOGICAL/CHEMCIAL THREAT
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: DROWNING - FIRE ONLY

## Working with: DROWNING - LARGE BODY OF WATER
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: INACCESSIBLE INCIDENT / OTHER ENTRAPMENTS (NON-TRAFFIC)

## Working with: DROWNING / NEAR DROWNING / DIVING / SCUBA ACCIDENT
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: CHEMICAL SPILL

## Working with: HIGH RISK SEARCH WARRANT
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: CARBON MONOXIDE /INHALATION / HAZMAT / CBRN

## Working with: TRANSFER CALL

## Working with: WATER RESCUE TRIBUTARY
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: GREATER CINCINNATI AIRPORT CRASH
## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Warning in densfun(x, parm[1], parm[2], ...): NaNs produced

## Working with: CARDIAC OR RESPIRATORY ARREST/DEATH

## Working with: BOAT FIRE ON RIVER

## Working with: FIRE SERVICE - NO DISPATCH/ NOT USED

## Working with: BIOHAZARD ALARM AT POST OFFICE

## Working with: HEMORRHAGE/LACERATIONS